Providing improved packaging with lineage information for each individual bird.
The data schema represents chicken and egg family trees.
Chickens have:
Eggs have:
Chicken Genealogy:
Note: You will probably need more columns than just the above minimum information.
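As a concrete starting point, the minimum schema can be written down as column lists. These names are taken from the tables built later in this notebook (PH_* = parent hen, PR_* = parent rooster, IH = incubation house); per the note above, a real farm would extend them with hatch dates, status fields, and so on.

```python
# Minimal schema sketch; column names follow the tables built later in this
# notebook (PH_* = parent hen, PR_* = parent rooster, IH = incubation house).
CHICKEN_COLUMNS = ['C_ID', 'Name', 'Sex', 'Color', 'Generation', 'Favorite_Song',
                   'E_ID', 'PH_Name', 'PH_ID', 'PR_ID']
EGG_COLUMNS = ['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW']
LOCATION_COLUMNS = ['IH_Location', 'Near_Window']
```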
We need to recreate (i.e., generate fake data for) all chickens currently on the farm (1,000 chickens), some of whom are parents of others.
Generate the required 1,000 records.
What can you do to make these records seem as realistic as possible (e.g., realistic timelines and ages)? Feel free to look up data as you need to, but tell us what you looked up.
The Tag includes:
The Chicken's name
Their Favorite song
Their Parents
Their Grandparents
The location where each parent and grandparent was incubated
A randomly selected first cousin of the chicken
https://github.com/aruljohn/popular-baby-names/blob/master/2000/boy_names_2000.csv
https://github.com/fivethirtyeight/data/blob/master/classic-rock/classic-rock-song-list.csv
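The tag's parent and grandparent fields can be resolved by walking the parent links twice. Below is a minimal sketch, assuming a chickens table with `C_ID`, `PH_ID` (parent hen), and `PR_ID` (parent rooster) columns like the one assembled later in this notebook; the helper name `lineage` is hypothetical.

```python
import pandas as pd

def lineage(chickens, c_id):
    """Hypothetical helper: resolve parents and grandparents for a tag by
    following the PH_ID / PR_ID links in a chickens table."""
    by_id = chickens.set_index('C_ID')

    def known_parents(cid):
        # Initial-flock birds have no recorded parents, so filter to known IDs
        if cid not in by_id.index:
            return []
        return [p for p in (by_id.loc[cid, 'PH_ID'], by_id.loc[cid, 'PR_ID'])
                if p in by_id.index]

    parents = known_parents(c_id)
    grandparents = [gp for p in parents for gp in known_parents(p)]
    return parents, grandparents
```

Cousins follow from the same links: find a parent's siblings (birds sharing that parent's parents), then pick one of their children at random.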
On March 11, 2020, the World Health Organization (WHO) declared COVID-19, the disease caused by the SARS-CoV-2 virus, a pandemic. The announcement followed a rising sense of alarm in the preceding months over a new, potentially lethal virus that was swiftly spreading around the world.
https://www.thehappychickencoop.com/whats-the-perfect-ratio-of-hens-to-roosters/
https://www.typesofchicken.com/best-chickens-for-texas-humidity/
Breeds for Texas: for the best egg-laying breed in Texas that doesn't have issues with humidity, heat, and fertility yet still lays well, we suggest the Mediterranean sorts:
https://www.typesofchicken.com/keeping-ancona-chickens/
import pandas as pd
import datetime
from datetime import date
from calendar import Calendar, monthrange
from dateutil.rrule import rrule, DAILY
import uuid
import numpy as np
np.random.seed(1)
import random
#We will use duckdb for Metabase analytics and visualization
import duckdb
# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
import warnings
warnings.filterwarnings('ignore')
import plotly.tools as tls
import plotly.express as px
import plotly.io as pio
## Parameters
start_date = '03/11/2020'
start_hen_count =45
start_rooster_count = 1
breed = 'Ancona' # Ancona is heat tolerant and a good fit for Texas
color = 'Black&White' #They are black with white specks
expected_egg_rate_min = 180
expected_egg_rate_max = 220
min_egg_producing_age = '20 weeks' #roughly 5 months
ancona_lifespan_min = '8 years'
ancona_lifespan_max = '12 years'
#Daily Log is the Audit Log of every day since the Farm began operation.
#Log can be used to track Month, High&Lo Temps, Humidity, Number of Sunlight Hours,
# Number of Eggs collected, Number of Eggs hatched, total number of chickens etc.
#Create initial daily log Pandas Dataframe
dailylog_df = pd.DataFrame()
#Initial Farm Operation Investment (4 hens & 1 rooster)
int_hen_list = [['Mary', 'Hen'], ['Pat', 'Hen'], ['Barb', 'Hen'], ['Liz', 'Hen'] ]
int_rooster_list = [['Ralph', 'Rooster']]
#Location
location = [['A', 'False'],['B', 'True'],['C', 'False']]
location_df = pd.DataFrame(location, columns = ['IH_Location', 'Near_Window'])
#Time delay from purchase to production as we purchased young chicks
delay_start = '5 weeks' #from purchase (03/11/2020) to first eggs (04/15/2020)
egg_production_start = '04/15/2020'
#Create empty Eggs Dataframe from a column list
egg_columns = ['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW']
egg_df = pd.DataFrame(columns=egg_columns)
#Songs Table - additional attributes can be used to plot and analyze
songs_df = pd.read_csv('Coop/songlist.csv')
hens_names = pd.read_csv('Coop/census-female-names.csv', usecols=[0], header=None)
rooster_names = pd.read_csv('Coop/boy_names_2000.csv', usecols=[1], header=None)
sex_list = ['Rooster', 'Hen']
rooster_names_list = rooster_names[1].tolist()
hen_names_list = hens_names[0].tolist()
#Convert Near_Window to Boolean Values
location_df['Near_Window'] = location_df['Near_Window'].map({'False':False, 'True':True})
#Egg Production Rates
egg_rate_min = expected_egg_rate_min / 365
egg_rate_max = expected_egg_rate_max / 365
print("min_rate/day = ", egg_rate_min, "\nmax_rate/day = ", egg_rate_max)
min_rate/day =  0.4931506849315068 
max_rate/day =  0.6027397260273972
#Create initial Chicken Dataframe from investment:
chicken_list = int_hen_list + int_rooster_list
df = pd.DataFrame(chicken_list, columns=['Name', 'Sex'])
df['Color'] = 'Black&White'
#df
#Egg Rate Per Chicken (Rooster = 0)
#egg_rate_min egg_rate_max
#Randomly Assign Egg Rate to Hens in Base Family
#size=4 covers only the 4 hens; the rooster's rate stays NaN
rate = np.random.randint(expected_egg_rate_min, expected_egg_rate_max, size=4)
df['egg_rate_per_year'] = pd.DataFrame(rate, columns=['egg_rate'])
The initial investment flock consisted of 4 Hens and 1 Rooster.
df.head(5)
| | Name | Sex | Color | egg_rate_per_year |
|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 217.0 |
| 1 | Pat | Hen | Black&White | 192.0 |
| 2 | Barb | Hen | Black&White | 188.0 |
| 3 | Liz | Hen | Black&White | 189.0 |
| 4 | Ralph | Rooster | Black&White | NaN |
#Map initial Family to unique Chicken_IDs
#Generate one unique ID for each bird
names = df['Name'].tolist()
# generate unique IDs
ids = np.random.randint(low=1e6, high=1e9, size=len(names))
# map IDs to names
maps = {k: v for k, v in zip(names, ids)}
# add new ID column
df['C_ID'] = df['Name'].map(maps)
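One caveat on the ID draw above: np.random.randint samples with replacement, so two birds could in principle receive the same C_ID (which is why the notebook checks for duplicates later). A collision-free sketch under the same 1e6 to 1e9 range; `unique_ids` is a hypothetical helper, not part of the original code:

```python
import numpy as np

def unique_ids(n, low=10**6, high=10**9, seed=1):
    """Draw n distinct integer IDs by re-drawing until no duplicates remain."""
    rng = np.random.default_rng(seed)
    ids = set()
    while len(ids) < n:
        # top up the set with fresh draws until it holds n distinct values
        ids.update(rng.integers(low, high, size=n - len(ids)).tolist())
    return sorted(ids)
```

For a 5-bird flock collisions are vanishingly unlikely, but at 2,000+ eggs per generation the guarantee becomes worthwhile.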
#Add Generation to Table - Initial_Flock, Gen1, Gen2, Gen3, Gen4
#The Generation column will help us track the operation growth and family, etc.
#Assign Initial_Flock to our initial flock birds
df['Generation'] = "Initial_Flock"
#Uncomment to show Birds with Names and IDs
df
| | Name | Sex | Color | egg_rate_per_year | C_ID | Generation |
|---|---|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 217.0 | 225766667 | Initial_Flock |
| 1 | Pat | Hen | Black&White | 192.0 | 631311759 | Initial_Flock |
| 2 | Barb | Hen | Black&White | 188.0 | 397591248 | Initial_Flock |
| 3 | Liz | Hen | Black&White | 189.0 | 630559425 | Initial_Flock |
| 4 | Ralph | Rooster | Black&White | NaN | 800981516 | Initial_Flock |
#Calculate end-date using today's date
today = date.today()
end_date = today
#Earlier experiment: iterating over the dates with dateutil's rrule
#for d in rrule(DAILY, dtstart=date(2020, 4, 15), until=end_date):
#    print(d.strftime("%Y-%m-%d"))
Use pandas to iterate through a range of dates
Use the pandas date_range() function, which returns a fixed-frequency DatetimeIndex.
Syntax: pandas.date_range(start, end)
Parameters:
start is the starting date
end is the ending date
# Build the daily log from the start of egg production (04/15/2020) through today
dailylog_df['Dates']= pd.date_range(start='04/15/2020', end=today)
df['diff_years'] = (dailylog_df.iloc[-1]['Dates'] - dailylog_df.iloc[0]['Dates']) / np.timedelta64(1, 'Y')
df['egg_count'] = df['egg_rate_per_year'] * df['diff_years']
print("The total number of Farm Operations Days to date:", dailylog_df.shape[0])
The total number of Farm Operations Days to date: 1050
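A heads-up on the np.timedelta64(1, 'Y') division used above: recent pandas/NumPy releases reject the 'Y' unit for timedelta conversion, since a "year" has no fixed length in days. A day-count alternative, assuming an average year of 365.25 days; `years_between` is a hypothetical helper, not from the notebook:

```python
import pandas as pd

def years_between(start, end):
    """Fractional years from a plain day count, avoiding the 'Y' timedelta unit."""
    return (pd.Timestamp(end) - pd.Timestamp(start)).days / 365.25
```

For example, years_between('2020-04-15', date.today()) would give the operating span used in the egg-count estimate below.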
This provides the expected total number of eggs from the Initial Flock since the start of Farm Operations.
df.head()
| | Name | Sex | Color | egg_rate_per_year | C_ID | Generation | diff_years | egg_count |
|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 217.0 | 225766667 | Initial_Flock | 2.872064 | 623.237986 |
| 1 | Pat | Hen | Black&White | 192.0 | 631311759 | Initial_Flock | 2.872064 | 551.436374 |
| 2 | Barb | Hen | Black&White | 188.0 | 397591248 | Initial_Flock | 2.872064 | 539.948117 |
| 3 | Liz | Hen | Black&White | 189.0 | 630559425 | Initial_Flock | 2.872064 | 542.820181 |
| 4 | Ralph | Rooster | Black&White | NaN | 800981516 | Initial_Flock | 2.872064 | NaN |
#Round Egg Count down as egg counts have to be a whole number, Int.
df['egg_count_round'] = df['egg_count'].apply(np.floor)
#Create separate Hens and Roosters tables for reporting and management.
hens_df = df[df['Sex'].str.match('Hen')].copy()
roosters_df = df[df['Sex'].str.match('Rooster')].copy()
#Convert the egg_count from floats to integers
hens_df['egg_count_round'] = hens_df['egg_count_round'].astype(int)
#Generate the total Egg expectancy from the Initial Flock, which will be used to generate the Eggs Table.
print("We expect the initial Flock has produced the following number of Eggs by this time:")
hens_df['egg_count_round'].sum()
We expect the initial Flock has produced the following number of Eggs by this time:
2255
#Uncomment to show the current Hens Table.
#hens_df
print("The current Egg Table contains the following Columns:")
egg_df.columns
The current Egg Table contains the following Columns:
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW'], dtype='object')
print("The current Location Table contains the following Columns:")
location_df.columns
The current Location Table contains the following Columns:
Index(['IH_Location', 'Near_Window'], dtype='object')
#Uncomment to show the current Rooster Table:
#roosters_df
print("Looping through each Hen and their associated egg counts per Hen to generate the Egg Table")
#eggs_df = pd.DataFrame()
#Convert location table to a list to randomly assign egg location - each egg can only have 1 of three locations based on assumptions.
location_list = location_df['IH_Location'].tolist()
number_of_samples = 1
NW = False  #default; set to True below for eggs incubated at location B
#Convert Rooster table to a list to randomly assign a Rooster to each Egg:
rooster_list = roosters_df['C_ID'].tolist()
for index, row in hens_df.iterrows():
    eggs_per_hen = row['egg_count_round']
    hen = row['Name']
    PH_ID = row['C_ID']
    print(row['Name'], eggs_per_hen)
    for i in range(eggs_per_hen):
        #Create a new random Egg ID - E_ID
        E_ID = np.random.randint(low=1e3, high=1e9)
        #Generate Parent Rooster ID - PR_ID
        PR_ID = random.choices(population=rooster_list, k=number_of_samples)
        #Generate Location and assign to each egg
        IH_LOC = random.choices(population=location_list, k=number_of_samples)
        #Append each egg to the Egg Table:
        egg_df = egg_df.append(pd.DataFrame({'E_ID': E_ID, 'PH_Name': hen, 'PH_ID': PH_ID, 'PR_ID': PR_ID, 'IH_LOC': IH_LOC, 'NW': NW}, index=[0]), ignore_index=True)
Looping through each Hen and their associated egg counts per Hen to generate the Egg Table
Mary 623
Pat 551
Barb 539
Liz 542
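Two notes on the loop above: DataFrame.append was removed in pandas 2.0, and appending one row at a time grows the table quadratically. A sketch of the same egg-table construction that builds one frame per hen and concatenates once; `make_egg_table` and its parameters are assumptions for illustration, not part of the original notebook:

```python
import numpy as np
import pandas as pd

def make_egg_table(hens_df, rooster_ids, locations, seed=1):
    """Build the egg table in one pass: one frame per hen, then a single concat."""
    rng = np.random.default_rng(seed)
    frames = []
    for _, hen in hens_df.iterrows():
        n = int(hen['egg_count_round'])
        frames.append(pd.DataFrame({
            'E_ID': rng.integers(1_000, 1_000_000_000, size=n),  # random egg IDs
            'PH_Name': hen['Name'],                # scalar broadcasts to all n rows
            'PH_ID': hen['C_ID'],
            'PR_ID': rng.choice(rooster_ids, size=n),   # random father per egg
            'IH_LOC': rng.choice(locations, size=n),    # random incubation house
        }))
    eggs = pd.concat(frames, ignore_index=True)
    eggs['NW'] = eggs['IH_LOC'] == 'B'   # Near_Window follows location B
    return eggs
```

This also sets NW in one vectorized step instead of patching it afterwards with .loc.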
#Add Parent_Gen Column to the Egg Table: the entries will be Initial_Flock, Gen1, Gen2, Gen3, Gen4
#The Parent_Gen column will help track the operation growth and family, etc.
#Assign Initial_Flock to our initial flock birds
egg_df['Parent_Gen'] = "Initial_Flock"
print("The Initial Flock has generated", egg_df.shape[0], "total eggs, added to the Egg Table")
The Initial Flock has generated 2255 total eggs, added to the Egg Table
print("The total number of unique Egg IDs:", egg_df['E_ID'].nunique())
The total number of unique Egg IDs: 2255
egg_df.head(2)
| | E_ID | PH_Name | PH_ID | PR_ID | IH_LOC | NW | Parent_Gen |
|---|---|---|---|---|---|---|---|
| 0 | 592323119 | Mary | 225766667 | 800981516 | C | False | Initial_Flock |
| 1 | 410431189 | Mary | 225766667 | 800981516 | C | False | Initial_Flock |
#Check for duplicate IDs
egg_df[egg_df.duplicated(['E_ID'], keep=False)]
| E_ID | PH_Name | PH_ID | PR_ID | IH_LOC | NW | Parent_Gen |
|---|---|---|---|---|---|---|
#Set Location values for Window Location based on IH_LOC == B
egg_df.loc[(egg_df['IH_LOC'] == 'B'), 'NW'] = True
ev_df = egg_df['PH_Name'].value_counts().rename_axis('Hen').reset_index(name='egg_count')
ev_df
| | Hen | egg_count |
|---|---|---|
| 0 | Mary | 623 |
| 1 | Pat | 551 |
| 2 | Liz | 542 |
| 3 | Barb | 539 |
#Plot total egg count per hen
fig = px.scatter(ev_df, x='Hen', y='egg_count',
                 template='none',
                 color='egg_count',
                 size='egg_count',
                 title="Egg count per hen")
pio.write_html(fig, 'totaleggs.html')
fig.show()
df.columns
Index(['Name', 'Sex', 'Color', 'egg_rate_per_year', 'C_ID', 'Generation',
'diff_years', 'egg_count', 'egg_count_round'],
dtype='object')
# Initial Flock Chickens Table
chickens_df = df[['Name', 'Sex', 'Color', 'C_ID', 'Generation']]
#Assign NaN values to First Generation Chickens for E_ID, PH_Name, PH_ID, PR_ID
songs_df.head(5)
| | Song Clean | ARTIST CLEAN | Release Year | COMBINED | First? | Year? | PlayCount | F*G |
|---|---|---|---|---|---|---|---|---|
| 0 | Caught Up in You | .38 Special | 1982 | Caught Up in You by .38 Special | 1 | 1 | 82 | 82 |
| 1 | Fantasy Girl | .38 Special | NaN | Fantasy Girl by .38 Special | 1 | 0 | 3 | 0 |
| 2 | Hold On Loosely | .38 Special | 1981 | Hold On Loosely by .38 Special | 1 | 1 | 85 | 85 |
| 3 | Rockin' Into the Night | .38 Special | 1980 | Rockin' Into the Night by .38 Special | 1 | 1 | 18 | 18 |
| 4 | Art For Arts Sake | 10cc | 1975 | Art For Arts Sake by 10cc | 1 | 1 | 1 | 1 |
#Create Song list and assign to Chickens
Generation = "Initial_Flock"
song_list = songs_df['Song Clean'].to_list()
chickens_df["Favorite_Song"] = np.random.choice(song_list, size=len(chickens_df))
#chickens_df['Generation'] = Generation
chickens_df
| | Name | Sex | Color | C_ID | Generation | Favorite_Song |
|---|---|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 225766667 | Initial_Flock | Paradise By The Dashboard Light |
| 1 | Pat | Hen | Black&White | 631311759 | Initial_Flock | With a Little Luck |
| 2 | Barb | Hen | Black&White | 397591248 | Initial_Flock | Walk On The Wild Side |
| 3 | Liz | Hen | Black&White | 630559425 | Initial_Flock | Wild Horses |
| 4 | Ralph | Rooster | Black&White | 800981516 | Initial_Flock | Pride of Man |
#Check for duplicate IDs
chickens_df[chickens_df.duplicated(['C_ID'], keep=False)]
| Name | Sex | Color | C_ID | Generation | Favorite_Song |
|---|---|---|---|---|---|
eggconversion_ratio = 0.25
hens_df
| | Name | Sex | Color | egg_rate_per_year | C_ID | Generation | diff_years | egg_count | egg_count_round |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 217.0 | 225766667 | Initial_Flock | 2.872064 | 623.237986 | 623 |
| 1 | Pat | Hen | Black&White | 192.0 | 631311759 | Initial_Flock | 2.872064 | 551.436374 | 551 |
| 2 | Barb | Hen | Black&White | 188.0 | 397591248 | Initial_Flock | 2.872064 | 539.948117 | 539 |
| 3 | Liz | Hen | Black&White | 189.0 | 630559425 | Initial_Flock | 2.872064 | 542.820181 | 542 |
#Calculate the selection rate to hatch per Hen
first_window = 30 * 6 * 0.4  #6-month window at 40% (reference value, not used below)
#time window = 6 months, i.e. egg_rate_per_year / 2
first_gen_selection = hens_df[['Name', 'Sex', 'egg_rate_per_year', 'C_ID', 'Generation']].copy()
first_gen_selection['Counts'] = (first_gen_selection['egg_rate_per_year'] / 2 * 0.27).apply(np.floor)
first_gen_selection
| | Name | Sex | egg_rate_per_year | C_ID | Generation | Counts |
|---|---|---|---|---|---|---|
| 0 | Mary | Hen | 217.0 | 225766667 | Initial_Flock | 29.0 |
| 1 | Pat | Hen | 192.0 | 631311759 | Initial_Flock | 25.0 |
| 2 | Barb | Hen | 188.0 | 397591248 | Initial_Flock | 25.0 |
| 3 | Liz | Hen | 189.0 | 630559425 | Initial_Flock | 25.0 |
#Select Eggs to hatch from the Hens - roughly 53% Hens / 47% Roosters
first_gen_slicer = first_gen_selection[['Name', 'C_ID', 'Counts']].copy()
first_gen_slicer['Counts'] = first_gen_slicer['Counts'].astype(int)
egg_df.columns
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW', 'Parent_Gen'], dtype='object')
chicken_first = pd.DataFrame()
for index, row in first_gen_slicer.iterrows():
    chicks_per_hen = row['Counts']
    hen = row['Name']
    PH_ID = row['C_ID']
    print(row['Name'], chicks_per_hen)
    #Sample hatch candidates from the full Egg Table (note: not restricted to this hen's own eggs)
    dataframe_first = egg_df[['E_ID', 'PH_Name', 'PH_ID', 'PR_ID']].sample(n=chicks_per_hen)
    chicken_first = chicken_first.append(dataframe_first)
chicken_first['Generation'] = 'Gen1'
chicken_first['Color'] = color
chicken_first['C_ID'] = chicken_first['E_ID']
Mary 29
Pat 25
Barb 25
Liz 25
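Note that the loop above samples each hen's chicks from the full Egg Table, so the hen being iterated is not necessarily the mother recorded on the sampled rows, and the same egg can be drawn in two different iterations (which is why the Gen 2 step later has to de-duplicate). A per-hen sampling sketch; `hatch_per_hen` and its `counts` mapping are assumptions, not the notebook's method:

```python
import pandas as pd

def hatch_per_hen(egg_df, counts, seed=1):
    """Sample each hen's hatchlings from her OWN eggs, so the per-hen count
    matches the mother recorded on the sampled rows. `counts` maps hen name
    to number of chicks; assumes hen names are unique in egg_df['PH_Name']."""
    picks = []
    for hen, n in counts.items():
        own = egg_df[egg_df['PH_Name'] == hen]
        # never ask for more chicks than the hen has eggs
        picks.append(own.sample(n=min(n, len(own)), random_state=seed))
    return pd.concat(picks).reset_index(drop=True)
```

Sampling within one call is without replacement, so this also removes the duplicate-egg problem by construction.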
#chicken_first
#Assign Favorite Song to First Gen Chickens
chicken_first["Favorite_Song"] = np.random.choice(song_list, size=len(chicken_first))
#Assign Sex to First Gen Chickens
chicken_first['Sex'] = np.random.choice(sex_list, size=len(chicken_first))
first_gen_rooster_count = chicken_first[chicken_first['Sex'] == "Rooster"].shape[0]
first_gen_rooster = chicken_first[chicken_first['Sex'] == "Rooster"].copy()
first_gen_hen_count = chicken_first[chicken_first['Sex'] == "Hen"].shape[0]
first_gen_hen = chicken_first[chicken_first['Sex'] == "Hen"].copy()
print("The First Generation has produced:")
print("================================")
print(first_gen_rooster_count, "Roosters")
print(first_gen_hen_count, "Hens")
#Generate Rooster names for the First Generation and assign (names may repeat, as with real chickens)
first_gen_rooster['Name'] = np.random.choice(rooster_names_list, size=len(first_gen_rooster))
#Generate Hen names for the First Generation and assign
first_gen_hen['Name'] = np.random.choice(hen_names_list, size=len(first_gen_hen))
first_gen_chickens = pd.concat([first_gen_hen, first_gen_rooster], axis=0)
The First Generation has produced:
================================
55 Roosters
49 Hens
first_gen_chickens.head(5)
| | E_ID | PH_Name | PH_ID | PR_ID | Generation | Color | C_ID | Favorite_Song | Sex | Name |
|---|---|---|---|---|---|---|---|---|---|---|
| 1134 | 884986770 | Pat | 631311759 | 800981516 | Gen1 | Black&White | 884986770 | Have a Drink On Me | Hen | GWENN |
| 850 | 744877899 | Pat | 631311759 | 800981516 | Gen1 | Black&White | 744877899 | And She Was | Hen | LOLITA |
| 358 | 59922539 | Mary | 225766667 | 800981516 | Gen1 | Black&White | 59922539 | Night Prowler | Hen | SHAROLYN |
| 1377 | 34072451 | Barb | 397591248 | 800981516 | Gen1 | Black&White | 34072451 | Hypnotized | Hen | RUFINA |
| 799 | 997044851 | Pat | 631311759 | 800981516 | Gen1 | Black&White | 997044851 | Limelight | Hen | FERMINA |
chickens_df.columns
chickens_df
| | Name | Sex | Color | C_ID | Generation | Favorite_Song |
|---|---|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 225766667 | Initial_Flock | Paradise By The Dashboard Light |
| 1 | Pat | Hen | Black&White | 631311759 | Initial_Flock | With a Little Luck |
| 2 | Barb | Hen | Black&White | 397591248 | Initial_Flock | Walk On The Wild Side |
| 3 | Liz | Hen | Black&White | 630559425 | Initial_Flock | Wild Horses |
| 4 | Ralph | Rooster | Black&White | 800981516 | Initial_Flock | Pride of Man |
#Check for duplicate IDs
chickens_df[chickens_df.duplicated(['C_ID'], keep=False)]
| Name | Sex | Color | C_ID | Generation | Favorite_Song |
|---|---|---|---|---|---|
#Reorder first_gen_chickens dataframe columns to match Initial Flock Table Format so we can Concatenate them
chickens_df.columns
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song'], dtype='object')
#Copy original chickens_df dataframe to initial_flock_chickens for record keeping
initial_flock_chickens = chickens_df.copy()
initial_flock_eggs = egg_df.copy()
first_gen_chickens.columns
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'Generation', 'Color', 'C_ID',
'Favorite_Song', 'Sex', 'Name'],
dtype='object')
first_gen_chickens = first_gen_chickens[['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
'PH_Name', 'PH_ID', 'PR_ID']]
#first_gen_chickens
#Combine Initial Flock Chickens and First Gen Chickens into Chickens Table
chickens_combined_df = pd.concat([chickens_df, first_gen_chickens], axis=0 )
chickens_combined_df.head(9)
| | Name | Sex | Color | C_ID | Generation | Favorite_Song | E_ID | PH_Name | PH_ID | PR_ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 225766667 | Initial_Flock | Paradise By The Dashboard Light | NaN | NaN | NaN | NaN |
| 1 | Pat | Hen | Black&White | 631311759 | Initial_Flock | With a Little Luck | NaN | NaN | NaN | NaN |
| 2 | Barb | Hen | Black&White | 397591248 | Initial_Flock | Walk On The Wild Side | NaN | NaN | NaN | NaN |
| 3 | Liz | Hen | Black&White | 630559425 | Initial_Flock | Wild Horses | NaN | NaN | NaN | NaN |
| 4 | Ralph | Rooster | Black&White | 800981516 | Initial_Flock | Pride of Man | NaN | NaN | NaN | NaN |
| 1134 | GWENN | Hen | Black&White | 884986770 | Gen1 | Have a Drink On Me | 884986770 | Pat | 631311759 | 800981516 |
| 850 | LOLITA | Hen | Black&White | 744877899 | Gen1 | And She Was | 744877899 | Pat | 631311759 | 800981516 |
| 358 | SHAROLYN | Hen | Black&White | 59922539 | Gen1 | Night Prowler | 59922539 | Mary | 225766667 | 800981516 |
| 1377 | RUFINA | Hen | Black&White | 34072451 | Gen1 | Hypnotized | 34072451 | Barb | 397591248 | 800981516 |
print("The initial Flock Hens generated", egg_df.shape[0], "eggs and", first_gen_chickens.shape[0], "chickens")
The initial Flock Hens generated 2255 eggs and 104 chickens
chickens_df['C_ID'].nunique()
5
#Check for duplicate IDs
chickens_df[chickens_df.duplicated(['C_ID'], keep=False)]
| Name | Sex | Color | C_ID | Generation | Favorite_Song |
|---|---|---|---|---|---|
chickens_df.to_csv('Coop/gen_one_flock_df.csv')
#Assign Egg Rates to Hens from Gen 1
#Randomly assign an egg rate to each Gen 1 Hen
rate1 = np.random.randint(expected_egg_rate_min, expected_egg_rate_max, size=first_gen_hen_count)
first_gen_hen['egg_rate_per_year'] = np.random.choice(rate1, size=len(first_gen_hen))
#first_gen_hen.head(5)
#Create initial eggloggen1_df daily log Pandas Dataframe
eggloggen1_df = pd.DataFrame()
eggloggen1_df['Dates']= pd.date_range(start='10/15/2020', end=today)
eggloggen1_df['diff_years'] = (eggloggen1_df.iloc[-1]['Dates'] - eggloggen1_df.iloc[0]['Dates']) / np.timedelta64(1, 'Y')
gen1_diff_years = (eggloggen1_df.iloc[-1]['Dates'] - eggloggen1_df.iloc[0]['Dates']) / np.timedelta64(1, 'Y')
first_gen_hen['egg_count'] = first_gen_hen['egg_rate_per_year'] * gen1_diff_years
#Round Egg Count down as egg counts have to be a whole number, Int.
first_gen_hen['egg_count_round'] = first_gen_hen['egg_count'].apply(np.floor)
#Convert the egg_count from Floats to Integers
first_gen_hen['egg_count_round'] = first_gen_hen['egg_count_round'].astype(int)
print("The total number of Farm Operations Days to date for Gen 1 Hens:", eggloggen1_df.shape[0])
The total number of Farm Operations Days to date for Gen 1 Hens: 867
#The roosters could be from the Original Flock (1) or the Gen1 Roosters
roosters_df = chickens_combined_df[chickens_combined_df.Sex == 'Rooster']
#The Hens for laying Gen 2 eggs would only include the Gen 1 Hens - first_gen_hen
#first_gen_hen.head(10)
first_gen_hen.columns
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'Generation', 'Color', 'C_ID',
'Favorite_Song', 'Sex', 'Name', 'egg_rate_per_year', 'egg_count',
'egg_count_round'],
dtype='object')
print("The number of potential Roosters for Gen 2 is:", roosters_df.shape[0])
The number of potential Roosters for Gen 2 is: 56
print("Looping through each Hen and their associated egg counts per Hen to generate the Egg Table")
#Second Generation of Eggs
eggs2_df = pd.DataFrame()
#Add Parent_Gen Column to the Egg Table: this is Gen1 producing Gen2 eggs
#The Parent_Gen column will help track the operation growth and family, etc.
eggs2_df['Parent_Gen'] = "Gen1"  #creates the column; each egg's value is set again in the loop below
#Convert location table to a list to randomly assign egg location - each egg can only have 1 of three locations based on assumptions.
location_list = location_df['IH_Location'].tolist()
number_of_samples = 1
NW = False  #default; set to True below for eggs incubated at location B
#Convert the complete Rooster table to a list to randomly assign a Rooster to each Egg:
#Use the complete Rooster list (Initial Flock + Gen1)
rooster_list = roosters_df['C_ID'].tolist()
#Iterate over each Gen1 Hen
for index, row in first_gen_hen.iterrows():
    eggs_per_hen = row['egg_count_round']
    hen = row['Name']
    PH_ID = row['C_ID']
    print(row['Name'], eggs_per_hen)
    for i in range(eggs_per_hen):
        #Create a new random Egg ID - E_ID
        E_ID = np.random.randint(low=1e3, high=1e9)
        Parent_Gen = "Gen1"
        #Generate Parent Rooster ID - PR_ID
        PR_ID = random.choices(population=rooster_list, k=number_of_samples)
        #Generate Location and assign to each egg
        IH_LOC = random.choices(population=location_list, k=number_of_samples)
        #Append each egg to the Egg Table:
        eggs2_df = eggs2_df.append(pd.DataFrame({'E_ID': E_ID, 'PH_Name': hen, 'PH_ID': PH_ID, 'PR_ID': PR_ID, 'IH_LOC': IH_LOC, 'NW': NW, 'Parent_Gen': Parent_Gen}, index=[0]), ignore_index=True)
Looping through each Hen and their associated egg counts per Hen to generate the Egg Table
GWENN 502
LOLITA 488
SHAROLYN 495
RUFINA 481
FERMINA 486
DEANE 483
MALORIE 462
HELLEN 471
AURA 481
RANA 483
ELFREDA 433
ALINA 483
ALEXANDRA 502
JONELL 476
PENELOPE 455
ZORAIDA 495
BARBIE 471
GIANNA 488
SONG 474
SHARLENE 500
CLARINE 474
JOHANA 476
YANG 448
ROXANNA 467
AKILAH 493
HWA 488
BARBRA 481
ALAYNA 457
SALLIE 505
OCTAVIA 483
RUBIE 497
AI 441
KARMA 512
MARLEEN 495
TOWANDA 500
AKILAH 438
DELIA 441
STEPHANIE 502
JOSIE 481
ILIANA 459
BRANDON 490
CONSUELO 502
ETHYL 502
ELLY 502
KANISHA 457
CATARINA 476
GRETCHEN 488
MARYAM 476
JANETTE 445
eggs2_df.head(3)
| | Parent_Gen | E_ID | PH_Name | PH_ID | PR_ID | IH_LOC | NW |
|---|---|---|---|---|---|---|---|
| 0 | Gen1 | 712701509.0 | GWENN | 884986770.0 | 287055622.0 | B | False |
| 1 | Gen1 | 571391458.0 | GWENN | 884986770.0 | 789856705.0 | A | False |
| 2 | Gen1 | 745110216.0 | GWENN | 884986770.0 | 287055622.0 | A | False |
eggs2_df.shape[0]
23485
#Check for duplicate IDs
eggs2_df[eggs2_df.duplicated(['E_ID'], keep=False)]
| Parent_Gen | E_ID | PH_Name | PH_ID | PR_ID | IH_LOC | NW |
|---|---|---|---|---|---|---|
#Set Location values for Window Location based on IH_LOC == B
eggs2_df.loc[(eggs2_df['IH_LOC'] == 'B'), 'NW'] = True
#eggs2_df.head(10)
#Use a smaller conversion ratio than the first generation (0.27)
gen2_eggconversion_ratio = 0.23
second_gen_selection = first_gen_hen[['Name', 'Sex', 'egg_rate_per_year', 'C_ID', 'Generation']].copy()
second_gen_selection['Counts'] = (second_gen_selection['egg_rate_per_year'] / 2 * gen2_eggconversion_ratio).apply(np.floor)
second_gen_selection['Counts'] = second_gen_selection['Counts'].astype(int)
print("This selection ratio should provide the farm with", second_gen_selection['Counts'].sum(), "Gen 2 Chickens")
This selection ratio should provide the farm with 1121 Gen 2 Chickens
#second_gen_selection
second_gen_slicer = second_gen_selection[['Name', 'C_ID', 'Counts']]
#second_gen_slicer
#Check for duplicate IDs
eggs2_df[eggs2_df.duplicated(['E_ID'], keep=False)]
| Parent_Gen | E_ID | PH_Name | PH_ID | PR_ID | IH_LOC | NW |
|---|---|---|---|---|---|---|
chicken_second = pd.DataFrame()
for index, row in second_gen_slicer.iterrows():
    chicks_per_hen = row['Counts']
    hen = row['Name']
    PH_ID = row['C_ID']
    #Sample hatch candidates from the full Gen2 Egg Table (not restricted to this hen's own eggs)
    dataframe_second = eggs2_df[['E_ID', 'PH_Name', 'PH_ID', 'PR_ID']].sample(n=chicks_per_hen, replace=False)
    #Convert Egg ID (E_ID) to Chicken ID before appending to the second generation dataframe
    dataframe_second['C_ID'] = dataframe_second['E_ID']
    chicken_second = chicken_second.append(dataframe_second)
chicken_second['Generation'] = 'Gen2'
chicken_second['Color'] = color
chicken_second.columns
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'C_ID', 'Generation', 'Color'], dtype='object')
#Check for duplicate IDs
chicken_second[chicken_second.duplicated(['E_ID'], keep=False)].sort_values("E_ID")
| | E_ID | PH_Name | PH_ID | PR_ID | C_ID | Generation | Color |
|---|---|---|---|---|---|---|---|
| 1321 | 32940298.0 | SHAROLYN | 59922539.0 | 890606190.0 | 32940298.0 | Gen2 | Black&White |
| 1321 | 32940298.0 | SHAROLYN | 59922539.0 | 890606190.0 | 32940298.0 | Gen2 | Black&White |
| 3785 | 62652878.0 | HELLEN | 905132962.0 | 789856705.0 | 62652878.0 | Gen2 | Black&White |
| 3785 | 62652878.0 | HELLEN | 905132962.0 | 789856705.0 | 62652878.0 | Gen2 | Black&White |
| 6789 | 134242878.0 | PENELOPE | 525986311.0 | 473017162.0 | 134242878.0 | Gen2 | Black&White |
| 6789 | 134242878.0 | PENELOPE | 525986311.0 | 473017162.0 | 134242878.0 | Gen2 | Black&White |
| 20796 | 163863830.0 | ELLY | 986027613.0 | 849068796.0 | 163863830.0 | Gen2 | Black&White |
| 20796 | 163863830.0 | ELLY | 986027613.0 | 849068796.0 | 163863830.0 | Gen2 | Black&White |
| 17668 | 218035193.0 | DELIA | 409344934.0 | 171198057.0 | 218035193.0 | Gen2 | Black&White |
| 17668 | 218035193.0 | DELIA | 409344934.0 | 171198057.0 | 218035193.0 | Gen2 | Black&White |
| 21565 | 225911866.0 | KANISHA | 1363281.0 | 473017162.0 | 225911866.0 | Gen2 | Black&White |
| 21565 | 225911866.0 | KANISHA | 1363281.0 | 473017162.0 | 225911866.0 | Gen2 | Black&White |
| 21194 | 291846482.0 | KANISHA | 1363281.0 | 167740021.0 | 291846482.0 | Gen2 | Black&White |
| 21194 | 291846482.0 | KANISHA | 1363281.0 | 167740021.0 | 291846482.0 | Gen2 | Black&White |
| 13166 | 378736202.0 | ALAYNA | 687466224.0 | 167740021.0 | 378736202.0 | Gen2 | Black&White |
| 13166 | 378736202.0 | ALAYNA | 687466224.0 | 167740021.0 | 378736202.0 | Gen2 | Black&White |
| 5153 | 385622112.0 | ELFREDA | 852945005.0 | 797726697.0 | 385622112.0 | Gen2 | Black&White |
| 5153 | 385622112.0 | ELFREDA | 852945005.0 | 797726697.0 | 385622112.0 | Gen2 | Black&White |
| 6330 | 505844466.0 | JONELL | 461131488.0 | 314063231.0 | 505844466.0 | Gen2 | Black&White |
| 6330 | 505844466.0 | JONELL | 461131488.0 | 314063231.0 | 505844466.0 | Gen2 | Black&White |
| 10191 | 595994296.0 | JOHANA | 919605639.0 | 119329546.0 | 595994296.0 | Gen2 | Black&White |
| 10191 | 595994296.0 | JOHANA | 919605639.0 | 119329546.0 | 595994296.0 | Gen2 | Black&White |
| 15771 | 647803701.0 | KARMA | 899514543.0 | 767163035.0 | 647803701.0 | Gen2 | Black&White |
| 15771 | 647803701.0 | KARMA | 899514543.0 | 767163035.0 | 647803701.0 | Gen2 | Black&White |
| 13581 | 650752941.0 | SALLIE | 763377021.0 | 30363405.0 | 650752941.0 | Gen2 | Black&White |
| 13581 | 650752941.0 | SALLIE | 763377021.0 | 30363405.0 | 650752941.0 | Gen2 | Black&White |
| 9126 | 718257024.0 | SHARLENE | 249446444.0 | 870351723.0 | 718257024.0 | Gen2 | Black&White |
| 9126 | 718257024.0 | SHARLENE | 249446444.0 | 870351723.0 | 718257024.0 | Gen2 | Black&White |
| 7167 | 793705703.0 | PENELOPE | 525986311.0 | 102634120.0 | 793705703.0 | Gen2 | Black&White |
| 7167 | 793705703.0 | PENELOPE | 525986311.0 | 102634120.0 | 793705703.0 | Gen2 | Black&White |
| 4624 | 801863461.0 | RANA | 923804214.0 | 167740021.0 | 801863461.0 | Gen2 | Black&White |
| 4624 | 801863461.0 | RANA | 923804214.0 | 167740021.0 | 801863461.0 | Gen2 | Black&White |
| 18920 | 803590029.0 | ILIANA | 598175413.0 | 797726697.0 | 803590029.0 | Gen2 | Black&White |
| 18920 | 803590029.0 | ILIANA | 598175413.0 | 797726697.0 | 803590029.0 | Gen2 | Black&White |
| 15185 | 841854364.0 | AI | 734213689.0 | 542838249.0 | 841854364.0 | Gen2 | Black&White |
| 15185 | 841854364.0 | AI | 734213689.0 | 542838249.0 | 841854364.0 | Gen2 | Black&White |
| 21392 | 853160964.0 | KANISHA | 1363281.0 | 200097845.0 | 853160964.0 | Gen2 | Black&White |
| 21392 | 853160964.0 | KANISHA | 1363281.0 | 200097845.0 | 853160964.0 | Gen2 | Black&White |
| 22849 | 898011103.0 | MARYAM | 683759644.0 | 518692123.0 | 898011103.0 | Gen2 | Black&White |
| 22849 | 898011103.0 | MARYAM | 683759644.0 | 518692123.0 | 898011103.0 | Gen2 | Black&White |
| 21743 | 912603148.0 | CATARINA | 468233022.0 | 762431730.0 | 912603148.0 | Gen2 | Black&White |
| 21743 | 912603148.0 | CATARINA | 468233022.0 | 762431730.0 | 912603148.0 | Gen2 | Black&White |
| 7258 | 931135376.0 | ZORAIDA | 830718116.0 | 866881069.0 | 931135376.0 | Gen2 | Black&White |
| 7258 | 931135376.0 | ZORAIDA | 830718116.0 | 866881069.0 | 931135376.0 | Gen2 | Black&White |
| 20276 | 943086742.0 | ETHYL | 45863615.0 | 120625354.0 | 943086742.0 | Gen2 | Black&White |
| 20276 | 943086742.0 | ETHYL | 45863615.0 | 120625354.0 | 943086742.0 | Gen2 | Black&White |
| 18636 | 952650159.0 | JOSIE | 379535793.0 | 167740021.0 | 952650159.0 | Gen2 | Black&White |
| 18636 | 952650159.0 | JOSIE | 379535793.0 | 167740021.0 | 952650159.0 | Gen2 | Black&White |
| 16871 | 977189628.0 | AKILAH | 527948599.0 | 890606190.0 | 977189628.0 | Gen2 | Black&White |
| 16871 | 977189628.0 | AKILAH | 527948599.0 | 890606190.0 | 977189628.0 | Gen2 | Black&White |
chicken_second['C_ID'].nunique()
1096
chicken_second.shape[0]
1121
#Check for duplicate IDs
#chicken_second[chicken_second.duplicated(['C_ID'], keep=False)].sort_values("C_ID")
#df = df.drop_duplicates('column_name', keep='last')
c2_clean_df = chicken_second.drop_duplicates('C_ID', keep='first')
c2_clean_df.shape[0]
1096
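As a side note, `drop_duplicates` returns a new frame rather than modifying in place, and the `keep` argument controls which copy survives. A minimal sketch on toy data (not the notebook's frames):

```python
import pandas as pd

# Toy frame with a repeated key, mimicking the duplicated C_ID rows above
df = pd.DataFrame({"C_ID": [1, 1, 2], "Name": ["A", "B", "C"]})

dupes = df[df.duplicated("C_ID", keep=False)]     # keep=False flags every copy
clean = df.drop_duplicates("C_ID", keep="first")  # keeps "A", drops "B"
```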
#Verify that duplicate Egg IDs are removed
c2_clean_count = len(c2_clean_df[c2_clean_df.duplicated(['E_ID'], keep=False)].sort_values("E_ID"))
c2_clean_df[c2_clean_df.duplicated(['E_ID'], keep=False)].sort_values("E_ID")
| E_ID | PH_Name | PH_ID | PR_ID | C_ID | Generation | Color |
|---|
#Reassign the clean dataframe only if no duplicate Egg IDs remain
print(c2_clean_count)
if c2_clean_count == 0:
    print("Re-assigning clean dataframe to second generation of Chickens")
    chicken_second = c2_clean_df
else:
    print("There are still some duplicate records")
0 Re-assigning clean dataframe to second generation of Chickens
#Assign Favorite Song to Second Gen Chickens
chicken_second["Favorite_Song"] = np.random.choice(song_list, size=len(chicken_second))
#Assign Sex to Second Gen Chickens
chicken_second['Sex'] = np.random.choice(sex_list, size=len(chicken_second))
#Use .copy() so the Name assignments below do not raise SettingWithCopyWarning
second_gen_rooster = chicken_second[chicken_second['Sex'] == "Rooster"].copy()
second_gen_rooster_count = second_gen_rooster.shape[0]
second_gen_hen = chicken_second[chicken_second['Sex'] == "Hen"].copy()
second_gen_hen_count = second_gen_hen.shape[0]
print("The Second Generation has produced:")
print("================================")
print(second_gen_rooster_count, "Roosters")
print(second_gen_hen_count, "Hens")
#Generate an appropriate number of Rooster names for the Second Generation and assign
second_gen_rooster_names = random.sample(rooster_names_list, second_gen_rooster_count)
second_gen_rooster['Name'] = np.random.choice(rooster_names_list, size=len(second_gen_rooster))
#Generate an appropriate number of Hen names for the Second Generation and assign
second_gen_hens_names = random.sample(hen_names_list, second_gen_hen_count)
second_gen_hen['Name'] = np.random.choice(hen_names_list, size=len(second_gen_hen))
second_gen_chickens = pd.concat([second_gen_hen, second_gen_rooster], axis=0)
The Second Generation has produced: ================================ 563 Roosters 533 Hens
second_gen_chickens.shape[0]
1096
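The uniform 50/50 draw above yields roughly as many roosters as hens; the hens-to-roosters ratio article linked earlier suggests real flocks run closer to one rooster per ten hens. A hedged sketch of a weighted draw (the `p=` weights are an assumption, not the notebook's method):

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed ratio: ~1 rooster per 10 hens, so P(Hen) = 10/11
sexes = rng.choice(["Hen", "Rooster"], size=1096, p=[10 / 11, 1 / 11])
hen_share = (sexes == "Hen").mean()
```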
second_gen_chickens.head(5)
| E_ID | PH_Name | PH_ID | PR_ID | C_ID | Generation | Color | Favorite_Song | Sex | Name | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1103 | 870011705.0 | SHAROLYN | 59922539.0 | 334058719.0 | 870011705.0 | Gen2 | Black&White | Wicked Sensation | Hen | LINNEA |
| 7127 | 221747441.0 | PENELOPE | 525986311.0 | 618028080.0 | 221747441.0 | Gen2 | Black&White | The Journey/It's Easy | Hen | SHARICE |
| 12599 | 263729520.0 | BARBRA | 832570811.0 | 650882381.0 | 263729520.0 | Gen2 | Black&White | Travelin' Man | Hen | EARLIE |
| 6072 | 928521689.0 | ALEXANDRA | 102582398.0 | 473017162.0 | 928521689.0 | Gen2 | Black&White | Fire Down Below | Hen | GITA |
| 23289 | 992321108.0 | JANETTE | 428335380.0 | 34072451.0 | 992321108.0 | Gen2 | Black&White | Houseparty | Hen | JERLENE |
#Combine the existing flock (Initial + First Gen) with the Second Gen Chickens into the Chickens Table
chickens_total_df = pd.concat([chickens_combined_df, second_gen_chickens], axis=0 )
chickens_total_df.shape[0]
1205
chickens_total_df.columns
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
'PH_Name', 'PH_ID', 'PR_ID'],
dtype='object')
chickens_total_df['C_ID'].nunique()
1203
#c2_clean_df = chicken_second.drop_duplicates('C_ID', keep='first')
chickens_total_df = chickens_total_df.drop_duplicates('C_ID', keep='first')
#Verify Duplicates are removed
chickens_total_df[chickens_total_df.duplicated(['C_ID'], keep=False)]
| Name | Sex | Color | C_ID | Generation | Favorite_Song | E_ID | PH_Name | PH_ID | PR_ID |
|---|
chickens_total_df.to_csv('Coop/chickens_total_df.csv')
#### Combine Both Generations' Eggs into a Complete Table
eggs_total_df = pd.concat([egg_df, eggs2_df], axis=0)
eggs_total_df.shape[0]
25740
eggs_total_df.to_csv('Coop/eggs_total_df.csv')
chickens_total_df.columns
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
'PH_Name', 'PH_ID', 'PR_ID'],
dtype='object')
chickens_total_df.shape[0]
1203
ctags_df = chickens_total_df[['Name', 'Favorite_Song', 'C_ID', 'E_ID', 'PH_Name', 'PH_ID', 'PR_ID']].copy()
#Rename the PH_Name column to Mother - to provide cleaner data
ctags_df.rename({'PH_Name': 'Mother'}, axis=1, inplace=True)
etags_df = eggs_total_df
etags_df.columns
Index(['E_ID', 'PH_Name', 'PH_ID', 'PR_ID', 'IH_LOC', 'NW', 'Parent_Gen'], dtype='object')
itag_df = etags_df[['E_ID', 'IH_LOC']]
#Merge Chickens with Eggs to map Chickens to Eggs.
tags_df = pd.merge(ctags_df, etags_df, how='outer', on = 'E_ID' , suffixes=['_Chicken','_Egg'])
tags_df.head(7)
| Name | Favorite_Song | C_ID | E_ID | Mother | PH_ID_Chicken | PR_ID_Chicken | PH_Name | PH_ID_Egg | PR_ID_Egg | IH_LOC | NW | Parent_Gen | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Paradise By The Dashboard Light | 225766667 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Pat | With a Little Luck | 631311759 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Barb | Walk On The Wild Side | 397591248 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Liz | Wild Horses | 630559425 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Ralph | Pride of Man | 800981516 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | GWENN | Have a Drink On Me | 884986770 | 884986770.0 | Pat | 631311759 | 800981516 | Pat | 631311759 | 800981516 | B | True | Initial_Flock |
| 6 | LOLITA | And She Was | 744877899 | 744877899.0 | Pat | 631311759 | 800981516 | Pat | 631311759 | 800981516 | C | False | Initial_Flock |
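The all-NaN rows above are the initial flock, which has no egg record. Passing `indicator=True` to `pd.merge` makes such rows easy to audit; a toy sketch (column names mirror the notebook's, but the data is made up):

```python
import pandas as pd

# Toy chicken/egg frames; the real code merges ctags_df with etags_df on E_ID
chickens = pd.DataFrame({"C_ID": [1, 2], "E_ID": [10, None]})
eggs = pd.DataFrame({"E_ID": [10, 11], "IH_LOC": ["A", "B"]})

merged = pd.merge(chickens, eggs, how="outer", on="E_ID", indicator=True)
# _merge flags each row as 'both', 'left_only' (initial flock, no egg record),
# or 'right_only' (eggs that never hatched into tracked chickens)
counts = merged["_merge"].value_counts()
```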
#tags_df.shape[0]
rooster_total_df = chickens_total_df[chickens_total_df['Sex'] == 'Rooster']
rooster_total_df.shape[0]
617
#rooster_total_df.columns
rooster_tags_df = rooster_total_df[['Name', 'C_ID', 'Generation', 'E_ID', 'PH_Name', 'PH_ID', 'PR_ID']]
#twr_df: Tags with Rooster Data
twr_df = pd.merge(tags_df, rooster_tags_df, how='inner', left_on='PR_ID_Chicken', right_on='C_ID', suffixes=['_t', '_Father'])
twr_df.shape[0]
1174
#twr_df.head(5)
twr_df.columns
Index(['Name_t', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother',
'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
'IH_LOC', 'NW', 'Parent_Gen', 'Name_Father', 'C_ID_Father',
'Generation', 'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID'],
dtype='object')
twr_df[['Name_t', 'Favorite_Song', 'Mother', 'Name_Father']].head(5)
| Name_t | Favorite_Song | Mother | Name_Father | |
|---|---|---|---|---|
| 0 | GWENN | Have a Drink On Me | Pat | Ralph |
| 1 | LOLITA | And She Was | Pat | Ralph |
| 2 | SHAROLYN | Night Prowler | Mary | Ralph |
| 3 | RUFINA | Hypnotized | Barb | Ralph |
| 4 | FERMINA | Limelight | Pat | Ralph |
#Export Tags with Roosters to CSV file for data analysis
twr_df.to_csv('Coop/twr_df.csv')
#twr_df.columns
#ctags_df.columns
#Chicken Grandparents Tags dataframe used for merging, so we carry less data into the merge
cgptags_df = ctags_df[['Name', 'Mother', 'C_ID', 'E_ID']]
#cgptags_df = ctags_df[['Name', 'C_ID', 'E_ID']]
#tags_with_rooster (Father) and GrandMother - twrgm_df
twrgm_df = pd.merge(twr_df, cgptags_df, how='outer', left_on='PH_ID_Chicken', right_on='C_ID', suffixes=['twr', '_Maternal_Mother'])
#tags_with_rooster_grandmother_and_grandfather
twrgf_df = pd.merge(twrgm_df, cgptags_df, how='outer', left_on='PR_ID', right_on='C_ID', suffixes=['CMM', '_Maternal_Father'])
twrgf_df.columns
Index(['Name_t', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mothertwr',
'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
'IH_LOC', 'NW', 'Parent_Gen', 'Name_Father', 'C_ID_Father',
'Generation', 'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID',
'NameCMM', 'Mother_Maternal_Mother', 'C_IDCMM', 'E_IDCMM',
'Name_Maternal_Father', 'Mother', 'C_ID_Maternal_Father',
'E_ID_Maternal_Father'],
dtype='object')
#Rename the grandparent and father name columns - to provide cleaner data
twrgf_df.rename({'Mother_Maternal_Mother': 'GrandMother',
                 'Name_Maternal_Father': 'GrandFather',
                 'Name_Father': 'Father'}, axis=1, inplace=True)
#twrgf_df['IH_LOC'].head(3)
twrgf_df[['GrandMother', 'GrandFather', 'Father']].head(5)
| GrandMother | GrandFather | Father | |
|---|---|---|---|
| 0 | NaN | NaN | Ralph |
| 1 | NaN | NaN | Ralph |
| 2 | NaN | NaN | Ralph |
| 3 | NaN | NaN | Ralph |
| 4 | NaN | NaN | Ralph |
#itag_df.columns
EIDs = [ 'E_ID_', 'PH_ID_Egg', 'PR_ID_Egg', 'E_ID_Father', 'E_ID_Maternal_Mother', 'E_ID']
#twrgd2_loc_df.columns
#Merge in Incubation Location for Mother, Father, GrandMother and GrandFather using Egg Table - E_IDs
twrgd_loc_df = pd.merge(twrgf_df, itag_df, left_on=['PH_ID_Chicken'], right_on = ['E_ID'], how='inner', suffixes=['Chicken', '_MotherIL'])
twrgd1_loc_df = pd.merge(twrgd_loc_df, itag_df, left_on=['PR_ID_Egg'], right_on = ['E_ID'], how='inner', suffixes=['CM', '_FatherIL'])
twrgd2_loc_df = pd.merge(twrgd1_loc_df, itag_df, left_on=['E_IDCMM'], right_on = ['E_ID'], how='inner', suffixes=['CMF', '_GrandMother_IL'])
twrgd3_loc_df = pd.merge(twrgd2_loc_df, itag_df, left_on=['E_ID'], right_on = ['E_ID'], how='inner', suffixes=['C', '_GrandFather_IL'])
#Analyze existing Column Names
twrgd3_loc_df.columns
Index(['Name_t', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mothertwr',
'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
'IH_LOCChicken', 'NW', 'Parent_Gen', 'Father', 'C_ID_Father',
'Generation', 'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID',
'NameCMM', 'GrandMother', 'C_IDCMM', 'E_IDCMM', 'GrandFather', 'Mother',
'C_ID_Maternal_Father', 'E_ID_Maternal_Father', 'E_IDCM',
'IH_LOC_MotherIL', 'E_ID_FatherIL', 'IH_LOCCMF', 'E_ID',
'IH_LOC_GrandMother_IL', 'IH_LOC'],
dtype='object')
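The four chained merges above generate hard-to-track suffixes (`IH_LOCChicken`, `IH_LOCCMF`, a bare `IH_LOC`). One alternative is a loop that renames the looked-up location immediately after each merge; a sketch on toy data, with column names that only loosely mirror the notebook's:

```python
import pandas as pd

# Toy location lookup (stand-in for itag_df: E_ID -> IH_LOC)
loc_lookup = pd.DataFrame({"E_ID": [1, 2], "IH_LOC": ["A", "B"]})
tags = pd.DataFrame({"Mother_E_ID": [1], "Father_E_ID": [2]})

for relative in ["Mother", "Father"]:
    # Merge in the incubation location for this relative, then rename
    # IH_LOC right away so the next merge never collides with it
    tags = tags.merge(loc_lookup, left_on=f"{relative}_E_ID",
                      right_on="E_ID", how="left")
    tags = tags.rename(columns={"IH_LOC": f"{relative}_I_LOC"}).drop(columns="E_ID")
```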
int_tags_df = twrgd3_loc_df.drop('Mother', axis=1)
int_tags_df.head(3)
| Name_t | Favorite_Song | C_ID_t | E_ID_t | Mothertwr | PH_ID_Chicken | PR_ID_Chicken | PH_Name_t | PH_ID_Egg | PR_ID_Egg | ... | GrandFather | C_ID_Maternal_Father | E_ID_Maternal_Father | E_IDCM | IH_LOC_MotherIL | E_ID_FatherIL | IH_LOCCMF | E_ID | IH_LOC_GrandMother_IL | IH_LOC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | 35791117.0 | 35791117.0 | ILIANA | 598175413.0 | 473017162.0 | ILIANA | 598175413.0 | 473017162.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 473017162 | B | 598175413 | B | B |
| 1 | Luca | Women | 988498811.0 | 988498811.0 | ILIANA | 598175413.0 | 890606190.0 | ILIANA | 598175413.0 | 890606190.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 890606190 | B | 598175413 | B | B |
| 2 | MANDIE | Mistral Wind | 282876868.0 | 282876868.0 | ILIANA | 598175413.0 | 773595312.0 | ILIANA | 598175413.0 | 773595312.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 773595312 | B | 598175413 | B | B |
3 rows × 34 columns
int_tags_df[['IH_LOCChicken',
'Parent_Gen','Mothertwr', 'Father',
'GrandMother',
'GrandFather',
'IH_LOC_MotherIL', 'IH_LOCCMF',
'IH_LOC_GrandMother_IL', 'IH_LOC']].head(5)
| IH_LOCChicken | Parent_Gen | Mothertwr | Father | GrandMother | GrandFather | IH_LOC_MotherIL | IH_LOCCMF | IH_LOC_GrandMother_IL | IH_LOC | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C | Gen1 | ILIANA | Derrick | Liz | Ralph | B | B | B | B |
| 1 | A | Gen1 | ILIANA | Bobby | Liz | Ralph | B | B | B | B |
| 2 | C | Gen1 | ILIANA | Easton | Liz | Ralph | B | B | B | B |
| 3 | B | Gen1 | ILIANA | Billy | Liz | Ralph | B | B | B | B |
| 4 | A | Gen1 | ILIANA | Billy | Liz | Ralph | B | B | B | B |
#Rename the twrgd3_loc_df columns - to provide cleaner data
#(the name column is 'Name_t'; the earlier 'Nametwr' rename was a silent no-op)
twrgd3_loc_df.rename({'Mothertwr': 'Mother',
                      'Name_t': 'Name',
                      'IH_LOCChicken': 'I_LOC',
                      'IH_LOC_MotherIL': 'Mother_I_LOC',
                      'IH_LOCCMF': 'Father_I_LOC',
                      'IH_LOC_GrandMother_IL': 'GrandMother_I_LOC',
                      'IH_LOC': 'GrandFather_I_LOC'}, axis=1, inplace=True)
#Rename the int_tags_df columns the same way - to provide cleaner data
int_tags_df.rename({'Mothertwr': 'Mother',
                    'Name_t': 'Name',
                    'IH_LOCChicken': 'I_LOC',
                    'IH_LOC_MotherIL': 'Mother_I_LOC',
                    'IH_LOCCMF': 'Father_I_LOC',
                    'IH_LOC_GrandMother_IL': 'GrandMother_I_LOC',
                    'IH_LOC': 'GrandFather_I_LOC'}, axis=1, inplace=True)
int_tags_df.head(5)
| Name | Favorite_Song | C_ID_t | E_ID_t | Mother | PH_ID_Chicken | PR_ID_Chicken | PH_Name_t | PH_ID_Egg | PR_ID_Egg | ... | GrandFather | C_ID_Maternal_Father | E_ID_Maternal_Father | E_IDCM | Mother_I_LOC | E_ID_FatherIL | Father_I_LOC | E_ID | GrandMother_I_LOC | GrandFather_I_LOC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | 35791117.0 | 35791117.0 | ILIANA | 598175413.0 | 473017162.0 | ILIANA | 598175413.0 | 473017162.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 473017162 | B | 598175413 | B | B |
| 1 | Luca | Women | 988498811.0 | 988498811.0 | ILIANA | 598175413.0 | 890606190.0 | ILIANA | 598175413.0 | 890606190.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 890606190 | B | 598175413 | B | B |
| 2 | MANDIE | Mistral Wind | 282876868.0 | 282876868.0 | ILIANA | 598175413.0 | 773595312.0 | ILIANA | 598175413.0 | 773595312.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 773595312 | B | 598175413 | B | B |
| 3 | MILAGRO | Bell Bottom Blues | 461050433.0 | 461050433.0 | ILIANA | 598175413.0 | 141566545.0 | ILIANA | 598175413.0 | 141566545.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 141566545 | B | 598175413 | B | B |
| 4 | TOBI | Still of the Night | 29249976.0 | 29249976.0 | ILIANA | 598175413.0 | 141566545.0 | ILIANA | 598175413.0 | 141566545.0 | ... | Ralph | 800981516 | NaN | 598175413 | B | 141566545 | B | 598175413 | B | B |
5 rows × 34 columns
int_tags_df.to_csv('Coop/tags.csv')
int_tags_df.columns
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC'],
dtype='object')
int_tags_df[['Name', 'Favorite_Song', 'Mother', 'Father', 'GrandMother', 'GrandFather', 'I_LOC', 'Mother_I_LOC', 'Father_I_LOC', 'GrandMother_I_LOC', 'GrandFather_I_LOC' ]].head(6)
| Name | Favorite_Song | Mother | Father | GrandMother | GrandFather | I_LOC | Mother_I_LOC | Father_I_LOC | GrandMother_I_LOC | GrandFather_I_LOC | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | ILIANA | Derrick | Liz | Ralph | C | B | B | B | B |
| 1 | Luca | Women | ILIANA | Bobby | Liz | Ralph | A | B | B | B | B |
| 2 | MANDIE | Mistral Wind | ILIANA | Easton | Liz | Ralph | C | B | B | B | B |
| 3 | MILAGRO | Bell Bottom Blues | ILIANA | Billy | Liz | Ralph | B | B | B | B | B |
| 4 | TOBI | Still of the Night | ILIANA | Billy | Liz | Ralph | A | B | B | B | B |
| 5 | CATHERIN | My Own Worst Enemy | ILIANA | Billy | Liz | Ralph | B | B | B | B | B |
int_tags_df.shape[0]
1050
#Filter on int_tags_df's own mask (indexing with a mask built from a different frame is fragile)
grand = int_tags_df[int_tags_df['Parent_Gen'] == 'Gen1']
grand.columns
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC'],
dtype='object')
grand[['Name', 'Mother',
'Parent_Gen', 'Father',
'GrandMother', 'GrandFather']].head(6)
| Name | Mother | Parent_Gen | Father | GrandMother | GrandFather | |
|---|---|---|---|---|---|---|
| 102 | Jerome | ETHYL | Gen1 | Anderson | Barb | Ralph |
| 103 | Harley | ETHYL | Gen1 | Kameron | Barb | Ralph |
| 104 | Tevin | ETHYL | Gen1 | Reid | Barb | Ralph |
| 105 | Vaughn | ETHYL | Gen1 | Reid | Barb | Ralph |
| 106 | Isaias | ETHYL | Gen1 | Simeon | Barb | Ralph |
| 107 | AMI | ETHYL | Gen1 | Randy | Barb | Ralph |
int_tags_df['GrandMother'].nunique()
4
cousins_df = int_tags_df.groupby('GrandMother')
cousins_df.first()
| Name | Favorite_Song | C_ID_t | E_ID_t | Mother | PH_ID_Chicken | PR_ID_Chicken | PH_Name_t | PH_ID_Egg | PR_ID_Egg | ... | GrandFather | C_ID_Maternal_Father | E_ID_Maternal_Father | E_IDCM | Mother_I_LOC | E_ID_FatherIL | Father_I_LOC | E_ID | GrandMother_I_LOC | GrandFather_I_LOC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GrandMother | |||||||||||||||||||||
| Barb | Mason | Harmony | 160915678.0 | 160915678.0 | ETHYL | 45863615.0 | 473017162.0 | ETHYL | 45863615.0 | 473017162.0 | ... | Ralph | 800981516 | None | 45863615 | C | 473017162 | B | 45863615 | C | C |
| Liz | Johan | Gone Gone Gone | 35791117.0 | 35791117.0 | ILIANA | 598175413.0 | 473017162.0 | ILIANA | 598175413.0 | 473017162.0 | ... | Ralph | 800981516 | None | 598175413 | B | 473017162 | B | 598175413 | B | B |
| Mary | GITA | Fire Down Below | 928521689.0 | 928521689.0 | ALEXANDRA | 102582398.0 | 473017162.0 | ALEXANDRA | 102582398.0 | 473017162.0 | ... | Ralph | 800981516 | None | 102582398 | A | 473017162 | B | 102582398 | A | A |
| Pat | Tyree | American Pie | 222255258.0 | 222255258.0 | AKILAH | 527948599.0 | 473017162.0 | AKILAH | 527948599.0 | 473017162.0 | ... | Ralph | 800981516 | None | 527948599 | B | 473017162 | B | 527948599 | B | B |
4 rows × 33 columns
barb_family_list = cousins_df.get_group('Barb')['Name'].tolist()
liz_family_list = cousins_df.get_group('Liz')['Name'].tolist()
mary_family_list = cousins_df.get_group('Mary')['Name'].tolist()
pat_family_list = cousins_df.get_group('Pat')['Name'].tolist()
size=len(barb_family_list)
size
162
# df['b'] = np.where(df.a.values == 0, np.nan, df.b.values)
#Use .copy() so the Cousin assignments below do not raise SettingWithCopyWarning
barb_df = int_tags_df[int_tags_df['GrandMother'] == 'Barb'].copy()
liz_df = int_tags_df[int_tags_df['GrandMother'] == 'Liz'].copy()
mary_df = int_tags_df[int_tags_df['GrandMother'] == 'Mary'].copy()
pat_df = int_tags_df[int_tags_df['GrandMother'] == 'Pat'].copy()
barb_df['Cousin'] = np.random.choice(barb_family_list, size=len(barb_family_list))
liz_df['Cousin'] = np.random.choice(liz_family_list, size=len(liz_family_list))
mary_df['Cousin'] = np.random.choice(mary_family_list, size=len(mary_family_list))
pat_df['Cousin'] = np.random.choice(pat_family_list, size=len(pat_family_list))
#int_tags_df['Cousin'] = np.where(int_tags_df['GrandMother'] == 'Barb'), np.random.choice(barb_family_list, size=len(barb_family_list))
barb_c_df = barb_df[['Name', 'C_ID_t', 'Cousin']]
liz_c_df = liz_df[['Name', 'C_ID_t', 'Cousin']]
mary_c_df = mary_df[['Name', 'C_ID_t', 'Cousin']]
pat_c_df = pat_df[['Name', 'C_ID_t', 'Cousin']]
#Stack the individual family cousin frames into a single dataframe for merging into the Chicken Tags with Grandparents and Location
cousins_selected_df = pd.concat([barb_c_df, liz_c_df, mary_c_df, pat_c_df], axis=0)
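The four per-family blocks above could also be collapsed into a single grouped pass. A sketch using `groupby(...).transform` on toy data — like the notebook's approach, it draws any family member (possibly a sibling or the chicken itself), so "cousin" is loose:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"Name": ["a", "b", "c", "d"],
                   "GrandMother": ["Barb", "Barb", "Liz", "Liz"]})

# Draw each chicken's cousin from its own grandmother's family in one pass,
# instead of building four per-family dataframes
df["Cousin"] = df.groupby("GrandMother")["Name"].transform(
    lambda names: rng.choice(names.tolist(), size=len(names)))
```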
#Uncomment and run to verify Chicken Name and Cousin Name
#cousins_selected_df.head(10)
int_tags_df.columns
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC'],
dtype='object')
int_tags_df[['Name', 'C_ID_t']].head(5)
| Name | C_ID_t | |
|---|---|---|
| 0 | Johan | 35791117.0 |
| 1 | Luca | 988498811.0 |
| 2 | MANDIE | 282876868.0 |
| 3 | MILAGRO | 461050433.0 |
| 4 | TOBI | 29249976.0 |
name_tags_full_df = pd.merge(int_tags_df, cousins_selected_df, how='outer', on='C_ID_t')
name_tags_full_df.columns
Index(['Name_x', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother',
'PH_ID_Chicken', 'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg',
'I_LOC', 'NW', 'Parent_Gen', 'Father', 'C_ID_Father', 'Generation',
'E_ID_Father', 'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM',
'GrandMother', 'C_IDCMM', 'E_IDCMM', 'GrandFather',
'C_ID_Maternal_Father', 'E_ID_Maternal_Father', 'E_IDCM',
'Mother_I_LOC', 'E_ID_FatherIL', 'Father_I_LOC', 'E_ID',
'GrandMother_I_LOC', 'GrandFather_I_LOC', 'Name_y', 'Cousin'],
dtype='object')
name_tags_full_df.rename({'Name_x': 'Name'}, axis=1, inplace=True)
name_tags_full_df[['Name', 'Cousin']].head(5)
| Name | Cousin | |
|---|---|---|
| 0 | Johan | Steven |
| 1 | Luca | Jaylon |
| 2 | MANDIE | Bryce |
| 3 | MILAGRO | DOMINIQUE |
| 4 | TOBI | Jarred |
name_tags_full_df[['Name', 'Favorite_Song', 'Mother', 'Father', 'GrandMother', 'GrandFather', 'I_LOC', 'Mother_I_LOC', 'Father_I_LOC', 'GrandMother_I_LOC', 'GrandFather_I_LOC', 'Cousin' ]].head(6)
| Name | Favorite_Song | Mother | Father | GrandMother | GrandFather | I_LOC | Mother_I_LOC | Father_I_LOC | GrandMother_I_LOC | GrandFather_I_LOC | Cousin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | ILIANA | Derrick | Liz | Ralph | C | B | B | B | B | Steven |
| 1 | Luca | Women | ILIANA | Bobby | Liz | Ralph | A | B | B | B | B | Jaylon |
| 2 | MANDIE | Mistral Wind | ILIANA | Easton | Liz | Ralph | C | B | B | B | B | Bryce |
| 3 | MILAGRO | Bell Bottom Blues | ILIANA | Billy | Liz | Ralph | B | B | B | B | B | DOMINIQUE |
| 4 | TOBI | Still of the Night | ILIANA | Billy | Liz | Ralph | A | B | B | B | B | Jarred |
| 5 | CATHERIN | My Own Worst Enemy | ILIANA | Billy | Liz | Ralph | B | B | B | B | B | MICHELE |
name_tags_full_df.shape[0]
1050
name_tags_full_df.columns
Index(['Name', 'Favorite_Song', 'C_ID_t', 'E_ID_t', 'Mother', 'PH_ID_Chicken',
'PR_ID_Chicken', 'PH_Name_t', 'PH_ID_Egg', 'PR_ID_Egg', 'I_LOC', 'NW',
'Parent_Gen', 'Father', 'C_ID_Father', 'Generation', 'E_ID_Father',
'PH_Name_Father', 'PH_ID', 'PR_ID', 'NameCMM', 'GrandMother', 'C_IDCMM',
'E_IDCMM', 'GrandFather', 'C_ID_Maternal_Father',
'E_ID_Maternal_Father', 'E_IDCM', 'Mother_I_LOC', 'E_ID_FatherIL',
'Father_I_LOC', 'E_ID', 'GrandMother_I_LOC', 'GrandFather_I_LOC',
'Name_y', 'Cousin'],
dtype='object')
duplicateIDs_nt_df = name_tags_full_df[name_tags_full_df.duplicated(['C_ID_t'], keep=False)]
duplicateIDs_nt_df.to_csv('Coop/duplicated_nt_ids.csv')
### Name Tag clean-up: remove any remaining duplicates and trim to the 1000 records the exercise requires
#Preview dedup by Name (display only - the result is not assigned, so name_tags_full_df keeps all its rows)
name_tags_full_df.drop_duplicates(subset='Name', keep='first')
| Name | Favorite_Song | C_ID_t | E_ID_t | Mother | PH_ID_Chicken | PR_ID_Chicken | PH_Name_t | PH_ID_Egg | PR_ID_Egg | ... | E_ID_Maternal_Father | E_IDCM | Mother_I_LOC | E_ID_FatherIL | Father_I_LOC | E_ID | GrandMother_I_LOC | GrandFather_I_LOC | Name_y | Cousin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | 35791117.0 | 35791117.0 | ILIANA | 598175413.0 | 473017162.0 | ILIANA | 598175413.0 | 473017162.0 | ... | NaN | 598175413 | B | 473017162 | B | 598175413 | B | B | Johan | Steven |
| 1 | Luca | Women | 988498811.0 | 988498811.0 | ILIANA | 598175413.0 | 890606190.0 | ILIANA | 598175413.0 | 890606190.0 | ... | NaN | 598175413 | B | 890606190 | B | 598175413 | B | B | Luca | Jaylon |
| 2 | MANDIE | Mistral Wind | 282876868.0 | 282876868.0 | ILIANA | 598175413.0 | 773595312.0 | ILIANA | 598175413.0 | 773595312.0 | ... | NaN | 598175413 | B | 773595312 | B | 598175413 | B | B | MANDIE | Bryce |
| 3 | MILAGRO | Bell Bottom Blues | 461050433.0 | 461050433.0 | ILIANA | 598175413.0 | 141566545.0 | ILIANA | 598175413.0 | 141566545.0 | ... | NaN | 598175413 | B | 141566545 | B | 598175413 | B | B | MILAGRO | DOMINIQUE |
| 4 | TOBI | Still of the Night | 29249976.0 | 29249976.0 | ILIANA | 598175413.0 | 141566545.0 | ILIANA | 598175413.0 | 141566545.0 | ... | NaN | 598175413 | B | 141566545 | B | 598175413 | B | B | TOBI | Jarred |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1043 | TANESHA | Elected | 504981698.0 | 504981698.0 | AKILAH | 135192679.0 | 119329546.0 | AKILAH | 135192679.0 | 119329546.0 | ... | NaN | 135192679 | B | 119329546 | A | 135192679 | B | B | TANESHA | ALVA |
| 1044 | SHANTI | Jungleland | 861338094.0 | 861338094.0 | AKILAH | 135192679.0 | 933690821.0 | AKILAH | 135192679.0 | 933690821.0 | ... | NaN | 135192679 | B | 933690821 | B | 135192679 | B | B | SHANTI | Alden |
| 1045 | Moshe | Top of the World | 375635108.0 | 375635108.0 | AKILAH | 135192679.0 | 862253912.0 | AKILAH | 135192679.0 | 862253912.0 | ... | NaN | 135192679 | B | 862253912 | B | 135192679 | B | B | Moshe | SEASON |
| 1046 | DORTHY | Down On the Corner | 631704795.0 | 631704795.0 | AKILAH | 135192679.0 | 536339090.0 | AKILAH | 135192679.0 | 536339090.0 | ... | NaN | 135192679 | B | 536339090 | C | 135192679 | B | B | DORTHY | SHARITA |
| 1049 | ADELE | The Rain Song | 551433486.0 | 551433486.0 | AKILAH | 135192679.0 | 787976750.0 | AKILAH | 135192679.0 | 787976750.0 | ... | NaN | 135192679 | B | 787976750 | B | 135192679 | B | B | ADELE | BARBERA |
912 rows × 36 columns
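Note that the 912-row frame above is display-only: `drop_duplicates` returns a new frame, and the result was never assigned back, so `name_tags_full_df` still holds 1050 rows (confirmed by the shape check further down). A minimal illustration:

```python
import pandas as pd

# drop_duplicates returns a new frame: without assignment the original
# is unchanged, which is why the 1050-row count persists above
df = pd.DataFrame({"Name": ["A", "A", "B"]})
viewed = df.drop_duplicates(subset="Name", keep="first")  # df itself untouched
```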
name_tags_full_df.to_csv('Coop/name_tags_complete_df.csv')
name_tags_df = name_tags_full_df[['Name', 'Favorite_Song', 'Mother', 'Father', 'GrandMother', 'GrandFather', 'I_LOC', 'Mother_I_LOC', 'Father_I_LOC', 'GrandMother_I_LOC', 'GrandFather_I_LOC', 'Cousin' ]]
name_tags_df
| Name | Favorite_Song | Mother | Father | GrandMother | GrandFather | I_LOC | Mother_I_LOC | Father_I_LOC | GrandMother_I_LOC | GrandFather_I_LOC | Cousin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | ILIANA | Derrick | Liz | Ralph | C | B | B | B | B | Steven |
| 1 | Luca | Women | ILIANA | Bobby | Liz | Ralph | A | B | B | B | B | Jaylon |
| 2 | MANDIE | Mistral Wind | ILIANA | Easton | Liz | Ralph | C | B | B | B | B | Bryce |
| 3 | MILAGRO | Bell Bottom Blues | ILIANA | Billy | Liz | Ralph | B | B | B | B | B | DOMINIQUE |
| 4 | TOBI | Still of the Night | ILIANA | Billy | Liz | Ralph | A | B | B | B | B | Jarred |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1045 | Moshe | Top of the World | AKILAH | Nigel | Mary | Ralph | A | B | B | B | B | SEASON |
| 1046 | DORTHY | Down On the Corner | AKILAH | Glenn | Mary | Ralph | B | B | C | B | B | SHARITA |
| 1047 | Misael | Knockin' On Heaven's Door | AKILAH | Ralph | Mary | Ralph | C | B | C | B | B | MARIBETH |
| 1048 | Kamron | Couldn't Stand the Weather | AKILAH | Javon | Mary | Ralph | C | B | A | B | B | REGENA |
| 1049 | ADELE | The Rain Song | AKILAH | Vincent | Mary | Ralph | C | B | B | B | B | BARBERA |
1050 rows × 12 columns
name_tags_full_df.shape[0]
1050
#Per the exercise instructions we generate just the first 1000 Tags for the Table.
name_tags_1000_df = name_tags_full_df.head(1000)
#Verify Name Tag Table Shape
name_tags_1000_df.shape
(1000, 36)
name_tags_df.to_csv('Coop/name_tags_final_df.csv')
name_tags_1000_df.to_csv('Coop/name_tags_1000_final_df.csv')
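Taking `head(1000)` keeps whichever rows the preceding merges happened to order first. If an unbiased subset matters, `DataFrame.sample` with a fixed `random_state` gives a reproducible random draw instead; a sketch:

```python
import pandas as pd

df = pd.DataFrame({"x": range(1050)})

first_1000 = df.head(1000)                       # order-dependent slice
random_1000 = df.sample(n=1000, random_state=1)  # reproducible random subset
```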
chickens_total_df['Generation'].nunique()
3
chickens_total_df.columns
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
'PH_Name', 'PH_ID', 'PR_ID'],
dtype='object')
print("Checking for any duplicate Chickens in the Table = ", len(chickens_total_df[chickens_total_df.duplicated(['C_ID'], keep=False)]))
chickens_total_df[chickens_total_df.duplicated(['C_ID'], keep=False)]
Checking for any duplicate Chickens in the Table = 0
| Name | Sex | Color | C_ID | Generation | Favorite_Song | E_ID | PH_Name | PH_ID | PR_ID |
|---|
## Bonus: Assign Egg and Hatch Dates to each Generation - used for plotting
#dailylog_df
#Assume ~150 days before the initial flock starts laying and those eggs hatch
#eggloggen1_df
gen_one_dates_list = dailylog_df[150:]['Dates'].tolist()
gen_two_dates_list = eggloggen1_df[150:]['Dates'].tolist()
#eggloggen1_df[150:]['Dates']
gen_one_length = chickens_total_df[chickens_total_df.Generation == 'Gen1'].shape[0]
gen_two_length = chickens_total_df[chickens_total_df.Generation == 'Gen2'].shape[0]
#Assign NaN values for Hatch_Date to Initial_Flock chickens (they pre-date the farm's records)
chickens_total_df.loc[chickens_total_df.Generation == 'Initial_Flock', 'Hatch_Date'] = np.nan
chickens_total_df.loc[chickens_total_df.Generation == 'Gen1', 'Hatch_Date'] = np.random.choice(gen_one_dates_list, size=gen_one_length)
chickens_total_df.loc[chickens_total_df.Generation == 'Gen2', 'Hatch_Date'] = np.random.choice(gen_two_dates_list, size=gen_two_length)
#eggs_total_df
chickens_total_df.head(7)
| Name | Sex | Color | C_ID | Generation | Favorite_Song | E_ID | PH_Name | PH_ID | PR_ID | Hatch_Date | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mary | Hen | Black&White | 225766667 | Initial_Flock | Paradise By The Dashboard Light | NaN | NaN | NaN | NaN | NaN |
| 1 | Pat | Hen | Black&White | 631311759 | Initial_Flock | With a Little Luck | NaN | NaN | NaN | NaN | NaN |
| 2 | Barb | Hen | Black&White | 397591248 | Initial_Flock | Walk On The Wild Side | NaN | NaN | NaN | NaN | NaN |
| 3 | Liz | Hen | Black&White | 630559425 | Initial_Flock | Wild Horses | NaN | NaN | NaN | NaN | NaN |
| 4 | Ralph | Rooster | Black&White | 800981516 | Initial_Flock | Pride of Man | NaN | NaN | NaN | NaN | NaN |
| 1134 | GWENN | Hen | Black&White | 884986770 | Gen1 | Have a Drink On Me | 884986770 | Pat | 631311759 | 800981516 | 2023-02-09 00:00:00 |
| 850 | LOLITA | Hen | Black&White | 744877899 | Gen1 | And She Was | 744877899 | Pat | 631311759 | 800981516 | 2021-02-06 00:00:00 |
chickens_total_df.columns
Index(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'E_ID',
'PH_Name', 'PH_ID', 'PR_ID', 'Hatch_Date'],
dtype='object')
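For the "realistic timelines" requirement, the hatch dates drawn above could additionally be sanity-checked against biology: pullets typically start laying around 5 months (~150 days, the assumption used earlier) and eggs incubate for about 21 days, so a chick should hatch at least ~171 days after its mother did. A sketch of that lower bound:

```python
import pandas as pd

# Assumptions: ~150 days to point of lay, ~21 days of incubation
mother_hatch = pd.Timestamp("2021-01-01")
earliest_chick_hatch = mother_hatch + pd.Timedelta(days=150 + 21)
```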
liz_df
| Name | Favorite_Song | C_ID_t | E_ID_t | Mother | PH_ID_Chicken | PR_ID_Chicken | PH_Name_t | PH_ID_Egg | PR_ID_Egg | ... | C_ID_Maternal_Father | E_ID_Maternal_Father | E_IDCM | Mother_I_LOC | E_ID_FatherIL | Father_I_LOC | E_ID | GrandMother_I_LOC | GrandFather_I_LOC | Cousin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | 35791117.0 | 35791117.0 | ILIANA | 598175413.0 | 473017162.0 | ILIANA | 598175413.0 | 473017162.0 | ... | 800981516 | NaN | 598175413 | B | 473017162 | B | 598175413 | B | B | Steven |
| 1 | Luca | Women | 988498811.0 | 988498811.0 | ILIANA | 598175413.0 | 890606190.0 | ILIANA | 598175413.0 | 890606190.0 | ... | 800981516 | NaN | 598175413 | B | 890606190 | B | 598175413 | B | B | Jaylon |
| 2 | MANDIE | Mistral Wind | 282876868.0 | 282876868.0 | ILIANA | 598175413.0 | 773595312.0 | ILIANA | 598175413.0 | 773595312.0 | ... | 800981516 | NaN | 598175413 | B | 773595312 | B | 598175413 | B | B | Bryce |
| 3 | MILAGRO | Bell Bottom Blues | 461050433.0 | 461050433.0 | ILIANA | 598175413.0 | 141566545.0 | ILIANA | 598175413.0 | 141566545.0 | ... | 800981516 | NaN | 598175413 | B | 141566545 | B | 598175413 | B | B | DOMINIQUE |
| 4 | TOBI | Still of the Night | 29249976.0 | 29249976.0 | ILIANA | 598175413.0 | 141566545.0 | ILIANA | 598175413.0 | 141566545.0 | ... | 800981516 | NaN | 598175413 | B | 141566545 | B | 598175413 | B | B | Jarred |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1030 | Jermaine | Come to Poppa | 188598569.0 | 188598569.0 | HELLEN | 905132962.0 | 536339090.0 | HELLEN | 905132962.0 | 536339090.0 | ... | 800981516 | NaN | 905132962 | C | 536339090 | C | 905132962 | C | C | Camren |
| 1031 | CATHEY | Friday on My Mind | 414853793.0 | 414853793.0 | HELLEN | 905132962.0 | 879147425.0 | HELLEN | 905132962.0 | 879147425.0 | ... | 800981516 | NaN | 905132962 | C | 879147425 | A | 905132962 | C | C | MAUDIE |
| 1032 | DELIA | It's Still Rock And Roll To Me | 97654270.0 | 97654270.0 | HELLEN | 905132962.0 | 763066813.0 | HELLEN | 905132962.0 | 763066813.0 | ... | 800981516 | NaN | 905132962 | C | 763066813 | A | 905132962 | C | C | LEANORA |
| 1033 | Vincenzo | With a Little Luck | 33155038.0 | 33155038.0 | HELLEN | 905132962.0 | 763066813.0 | HELLEN | 905132962.0 | 763066813.0 | ... | 800981516 | NaN | 905132962 | C | 763066813 | A | 905132962 | C | C | ALVA |
| 1034 | Jared | This Beat Goes on/Switching to Glide | 62652878.0 | 62652878.0 | HELLEN | 905132962.0 | 789856705.0 | HELLEN | 905132962.0 | 789856705.0 | ... | 800981516 | NaN | 905132962 | C | 789856705 | C | 905132962 | C | C | Myles |
319 rows × 35 columns
eggs_total_df.shape[0]
25740
#Remove eggs that hatched into chickens: outer-merge on E_ID with an indicator, then keep only the rows found in eggs_total_df (an anti-join)
egg_only_df = pd.merge(eggs_total_df, chickens_total_df, how='outer', on='E_ID', indicator=True).query('_merge=="left_only"')
egg_only_df.shape[0]
24542
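The anti-join pattern above can be illustrated on small stand-in frames (the toy `E_ID` values here are hypothetical). Using `how="left"` gives the same `left_only` rows as the full outer merge while skipping right-only rows:

```python
import pandas as pd

# Toy stand-ins for eggs_total_df and chickens_total_df
eggs = pd.DataFrame({"E_ID": [1, 2, 3, 4], "IH_LOC": ["A", "B", "C", "A"]})
chicks = pd.DataFrame({"E_ID": [2, 4], "Name": ["Pat", "Liz"]})

# Anti-join: keep eggs whose E_ID never appears among hatched chickens
unhatched = (
    pd.merge(eggs, chicks, how="left", on="E_ID", indicator=True)
    .query('_merge == "left_only"')
    .drop(columns=["Name", "_merge"])
)
print(unhatched["E_ID"].tolist())  # [1, 3]
```

The `indicator=True` flag adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`, which is what the `.query()` filters on.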
#egg_only_df.columns
# Add Egg Dates to Eggs.
egg_only_df.drop(['Name', 'Sex', 'Color', 'C_ID', 'Generation', 'Favorite_Song', 'PH_Name_y', 'PH_ID_y', 'PR_ID_y', 'Hatch_Date', '_merge'], axis=1, inplace=True)
egg_only_df.head(5)
| | E_ID | PH_Name_x | PH_ID_x | PR_ID_x | IH_LOC | NW | Parent_Gen |
|---|---|---|---|---|---|---|---|
| 0 | 592323119.0 | Mary | 225766667 | 800981516 | C | False | Initial_Flock |
| 1 | 410431189.0 | Mary | 225766667 | 800981516 | C | False | Initial_Flock |
| 2 | 729054692.0 | Mary | 225766667 | 800981516 | B | True | Initial_Flock |
| 3 | 630362478.0 | Mary | 225766667 | 800981516 | B | True | Initial_Flock |
| 4 | 796885249.0 | Mary | 225766667 | 800981516 | A | False | Initial_Flock |
egg_length = egg_only_df.shape[0]
egg_only_df['Egg_Date'] = np.random.choice(gen_one_dates_list, size=egg_length)
egg_only_df.head(5)
| | E_ID | PH_Name_x | PH_ID_x | PR_ID_x | IH_LOC | NW | Parent_Gen | Egg_Date |
|---|---|---|---|---|---|---|---|---|
| 0 | 592323119.0 | Mary | 225766667 | 800981516 | C | False | Initial_Flock | 2022-06-15 |
| 1 | 410431189.0 | Mary | 225766667 | 800981516 | C | False | Initial_Flock | 2021-11-29 |
| 2 | 729054692.0 | Mary | 225766667 | 800981516 | B | True | Initial_Flock | 2021-08-26 |
| 3 | 630362478.0 | Mary | 225766667 | 800981516 | B | True | Initial_Flock | 2022-03-04 |
| 4 | 796885249.0 | Mary | 225766667 | 800981516 | A | False | Initial_Flock | 2021-10-27 |
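The date assignment samples uniformly, with replacement, from a pre-built list of candidate dates. A minimal sketch of the same idea, with a hypothetical two-year range standing in for `gen_one_dates_list`:

```python
import numpy as np
import pandas as pd

np.random.seed(1)

# Hypothetical candidate lay dates (stand-in for gen_one_dates_list)
candidate_dates = pd.date_range("2021-01-01", "2022-12-31", freq="D")

df = pd.DataFrame({"E_ID": np.arange(5)})
# One date per egg, sampled with replacement, uniformly over the range
df["Egg_Date"] = np.random.choice(candidate_dates, size=df.shape[0])
```

Because the sampling is uniform, lay dates spread evenly across the window; a more realistic model could weight the draw toward seasons when hens actually lay more.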
#Connect jupysql to DuckDB with a SQLAlchemy-style connection string, either to an in-memory DuckDB or to a file-backed database
%sql duckdb:///Coop/coop_farm_db.duckdb
### Convert the necessary DataFrames to DuckDB tables for integration with Metabase
# create the table "chicken_name_tags" from the DataFrame "name_tags_df"
%sql CREATE TABLE chicken_name_tags as select * from name_tags_df ;
| | Count |
|---|---|
| 0 | 1050 |
# insert into the table "chicken_name_tags" from the DataFrame "name_tags_df". Note: the CREATE TABLE AS above already copied the 1050 rows, so this INSERT duplicates them (hence the 2100 rows in the SELECT below)
%sql INSERT INTO chicken_name_tags SELECT * FROM name_tags_df;
| | Count |
|---|---|
| 0 | 1050 |
#Execute a Select query to see all the table data in the chicken_name_tags table
%sql select * from chicken_name_tags;
| | Name | Favorite_Song | Mother | Father | GrandMother | GrandFather | I_LOC | Mother_I_LOC | Father_I_LOC | GrandMother_I_LOC | GrandFather_I_LOC | Cousin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | ILIANA | Derrick | Liz | Ralph | C | B | B | B | B | Steven |
| 1 | Luca | Women | ILIANA | Bobby | Liz | Ralph | A | B | B | B | B | Jaylon |
| 2 | MANDIE | Mistral Wind | ILIANA | Easton | Liz | Ralph | C | B | B | B | B | Bryce |
| 3 | MILAGRO | Bell Bottom Blues | ILIANA | Billy | Liz | Ralph | B | B | B | B | B | DOMINIQUE |
| 4 | TOBI | Still of the Night | ILIANA | Billy | Liz | Ralph | A | B | B | B | B | Jarred |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2095 | Moshe | Top of the World | AKILAH | Nigel | Mary | Ralph | A | B | B | B | B | SEASON |
| 2096 | DORTHY | Down On the Corner | AKILAH | Glenn | Mary | Ralph | B | B | C | B | B | SHARITA |
| 2097 | Misael | Knockin' On Heaven's Door | AKILAH | Ralph | Mary | Ralph | C | B | C | B | B | MARIBETH |
| 2098 | Kamron | Couldn't Stand the Weather | AKILAH | Javon | Mary | Ralph | C | B | A | B | B | REGENA |
| 2099 | ADELE | The Rain Song | AKILAH | Vincent | Mary | Ralph | C | B | B | B | B | BARBERA |
2100 rows × 12 columns
#Show tables in the DuckDB database
%sql show tables;
| | name |
|---|---|
| 0 | chicken_name_tags |
#Investigate and import the additional DataFrames into DuckDB tables.
#List of DataFrames to ingest into DuckDB tables
dataframes = ['songs_df', 'egg_only_df', 'chickens_total_df','int_tags_df', 'barb_df', 'liz_df', 'mary_df', 'pat_df', 'name_tags_full_df']
#name_tags_df - Already Ingested
dataframes
['songs_df', 'egg_only_df', 'chickens_total_df', 'int_tags_df', 'barb_df', 'liz_df', 'mary_df', 'pat_df', 'name_tags_full_df']
#Loop: create a DuckDB table from each DataFrame (optionally also copy it to Parquet)
for dataframe in dataframes:
    print("creating table:", dataframe)
    %sql CREATE TABLE {dataframe} as select * from {dataframe};
    #Note: once the table exists it shadows the DataFrame of the same name,
    #so this INSERT copies the table into itself, doubling its rows
    %sql INSERT INTO {dataframe} SELECT * FROM {dataframe};
    #%sql COPY {dataframe} TO '{dataframe}.parquet' (FORMAT PARQUET);
creating table: songs_df
creating table: egg_only_df
creating table: chickens_total_df
creating table: int_tags_df
creating table: barb_df
creating table: liz_df
creating table: mary_df
creating table: pat_df
creating table: name_tags_full_df
%sql select * from chicken_name_tags;
| | Name | Favorite_Song | Mother | Father | GrandMother | GrandFather | I_LOC | Mother_I_LOC | Father_I_LOC | GrandMother_I_LOC | GrandFather_I_LOC | Cousin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Johan | Gone Gone Gone | ILIANA | Derrick | Liz | Ralph | C | B | B | B | B | Steven |
| 1 | Luca | Women | ILIANA | Bobby | Liz | Ralph | A | B | B | B | B | Jaylon |
| 2 | MANDIE | Mistral Wind | ILIANA | Easton | Liz | Ralph | C | B | B | B | B | Bryce |
| 3 | MILAGRO | Bell Bottom Blues | ILIANA | Billy | Liz | Ralph | B | B | B | B | B | DOMINIQUE |
| 4 | TOBI | Still of the Night | ILIANA | Billy | Liz | Ralph | A | B | B | B | B | Jarred |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2095 | Moshe | Top of the World | AKILAH | Nigel | Mary | Ralph | A | B | B | B | B | SEASON |
| 2096 | DORTHY | Down On the Corner | AKILAH | Glenn | Mary | Ralph | B | B | C | B | B | SHARITA |
| 2097 | Misael | Knockin' On Heaven's Door | AKILAH | Ralph | Mary | Ralph | C | B | C | B | B | MARIBETH |
| 2098 | Kamron | Couldn't Stand the Weather | AKILAH | Javon | Mary | Ralph | C | B | A | B | B | REGENA |
| 2099 | ADELE | The Rain Song | AKILAH | Vincent | Mary | Ralph | C | B | B | B | B | BARBERA |
2100 rows × 12 columns
#After testing we discovered that Metabase doesn't currently support the latest DuckDB version, so we export the
#database as a set of Parquet files, which the Metabase DuckDB connection can read as plain files.
%sql EXPORT DATABASE 'Coop' (FORMAT PARQUET);